A basic concept in (statistical) programming is called a variable.
A variable allows you to store a value (e.g. 4) or an object (e.g. a function description) in R. You can then later use this variable’s name to easily access the value or the object that is stored within this variable.
Save information as an R object with the less-than sign followed by a minus, i.e. an arrow: <-
# name of new object | assignment operator ("gets") | information to store in the object
foo <- 42
Save the output of one function as an R object to use in a second function
foo <- round(3.1415) + 1
foo
[1] 4
factorial(foo)
[1] 24
You can remove an object with rm
fac_foo<-factorial(foo)
fac_foo
[1] 24
rm(foo)
rm(fac_foo)
mean<-mean(rnorm(100))
mean
[1] -0.06199669
?mean
rm(mean)
pi
[1] 3.141593
pi<-1
pi
[1] 1
rm(pi)
pi
[1] 3.141593
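The mean and pi examples above suggest that assigning to a built-in name does not destroy the original; it only hides it until the new object is removed. A minimal sketch of that behaviour (the names masked, builtin and restored are only for illustration):

```r
# Assigning to `pi` creates a new object that masks the built-in constant.
pi <- 1
masked <- pi             # the user-defined value, 1

# The built-in constant is still reachable through its package, base.
builtin <- base::pi

# rm() removes only the user-defined copy, so the built-in is visible again.
rm(pi)
restored <- pi
```

Even though R allows it, reusing names such as mean or pi for your own objects tends to cause confusion, so a different name is usually the safer choice.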
You can save more than a single number in an object by creating a vector, matrix, or array.
WorldPhones
N.Amer Europe Asia S.Amer Oceania Africa Mid.Amer
1951 45939 21574 2876 1815 1646 89 555
1956 60423 29990 4708 2568 2366 1411 733
1957 64721 32510 5230 2695 2526 1546 773
1958 68484 35218 6662 2845 2691 1663 836
1959 71799 37598 6856 3000 2868 1769 911
1960 76036 40341 8220 3145 3054 1905 1008
1961 79831 43173 9053 3338 3224 2005 1076
class(WorldPhones)
[1] "matrix"
Combine multiple elements into a one-dimensional array.
Create with the c function.
vec<-c(1,2,3,10,100)
vec
[1] 1 2 3 10 100
Combine multiple elements into a two dimensional array.
Create with the matrix function.
# data: vector of elements to go in the matrix; nrow: number of rows for the matrix
mat<-matrix(data = c(1,2,3,4,5,6), nrow = 2)
mat
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
# data: vector of elements to go in the matrix; nrow: number of rows for the matrix
mat<-matrix(data = c(1,2,3,4,5,6), nrow = 3)
mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
# first argument: vector of elements to go in the matrix; ncol: number of columns for the matrix
#rbind(c(1,2,3,4,5,6),c(1,2,3,4,5,6)*2)
mat<-matrix(c(1,2,3,4,5,6),ncol=2)
mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
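The three calls above all fill the matrix column by column, which is the default behaviour of matrix. As a small sketch (the names vals, by_col and by_row are illustrative), the byrow argument switches to row-wise filling:

```r
vals <- c(1, 2, 3, 4, 5, 6)

# Default: values fill the first column, then the second, and so on.
by_col <- matrix(vals, nrow = 2)
by_col[1, ]     # 1 3 5

# byrow = TRUE: values fill the first row, then the second.
by_row <- matrix(vals, nrow = 2, byrow = TRUE)
by_row[1, ]     # 1 2 3

# Both have the same shape: 2 rows, 3 columns.
dim(by_col)     # 2 3
```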
vec2<-vec
vec2<-vec + 4
vec; vec2
[1] 1 2 3 10 100
[1] 5 6 7 14 104
vec * 4 ; vec2 * 4
[1] 4 8 12 40 400
[1] 20 24 28 56 416
vec * vec ; vec2 * vec2; c(23,vec) * c(vec2,2);vec;vec2
[1] 1 4 9 100 10000
[1] 25 36 49 196 10816
[1] 115 6 14 42 1040 200
[1] 1 2 3 10 100
[1] 5 6 7 14 104
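Arithmetic on vectors works element by element, as the outputs above show. When the two vectors differ in length, R reuses (recycles) the shorter one. A small sketch with illustrative names:

```r
long  <- c(10, 20, 30, 40)
short <- c(1, 2)

# Equal lengths: plain element-wise product.
squares <- long * long      # 100 400 900 1600

# Unequal lengths: `short` is recycled as 1, 2, 1, 2.
recycled <- long * short    # 10 40 30 80

# A single number is a length-one vector, recycled across all elements.
scaled <- long * 4          # 40 80 120 160
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign that the vectors were not meant to be combined.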
inner
mat<-matrix(c(1,2,3,4,5,6,7,8,9),ncol=3)
vec; vec %*% vec; mat;mat %*% mat
[1] 1 2 3 10 100
[,1]
[1,] 10114
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
[,1] [,2] [,3]
[1,] 30 66 102
[2,] 36 81 126
[3,] 42 96 150
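For a plain vector, %*% returns the inner product as a 1 x 1 matrix, which is why 10114 is printed with [1,] and [,1] labels above. A sketch of the equivalent element-wise computation (vec is redefined here so the block stands alone; by_hand, as_matrix and as_number are illustrative names):

```r
vec <- c(1, 2, 3, 10, 100)

# Inner product by hand: multiply element-wise, then add up.
by_hand <- sum(vec * vec)      # 10114

# %*% gives the same number, wrapped in a 1 x 1 matrix.
as_matrix <- vec %*% vec
dim(as_matrix)                 # 1 1

# drop() strips the matrix wrapper and leaves a plain number.
as_number <- drop(as_matrix)   # 10114
```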
outer
mat<-matrix(c(1,2,3,4,5,6),ncol=2)
vec; vec %o% vec; mat; mat %o% mat; mat %o% vec
[1] 1 2 3 10 100
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 10 100
[2,] 2 4 6 20 200
[3,] 3 6 9 30 300
[4,] 10 20 30 100 1000
[5,] 100 200 300 1000 10000
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 1, 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2, 1
[,1] [,2]
[1,] 2 8
[2,] 4 10
[3,] 6 12
, , 3, 1
[,1] [,2]
[1,] 3 12
[2,] 6 15
[3,] 9 18
, , 1, 2
[,1] [,2]
[1,] 4 16
[2,] 8 20
[3,] 12 24
, , 2, 2
[,1] [,2]
[1,] 5 20
[2,] 10 25
[3,] 15 30
, , 3, 2
[,1] [,2]
[1,] 6 24
[2,] 12 30
[3,] 18 36
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 2 8
[2,] 4 10
[3,] 6 12
, , 3
[,1] [,2]
[1,] 3 12
[2,] 6 15
[3,] 9 18
, , 4
[,1] [,2]
[1,] 10 40
[2,] 20 50
[3,] 30 60
, , 5
[,1] [,2]
[1,] 100 400
[2,] 200 500
[3,] 300 600
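The long outputs above are easier to follow through their dimensions: the result of %o% has the dimensions of the left operand followed by those of the right one. A short sketch, reusing the same vec and mat (arr is an illustrative name):

```r
vec <- c(1, 2, 3, 10, 100)
mat <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)

dim(vec %o% vec)   # 5 5
dim(mat %o% mat)   # 3 2 3 2
dim(mat %o% vec)   # 3 2 5

# Each entry is the product of one element from each operand.
arr <- mat %o% vec
arr[3, 2, 4]       # mat[3, 2] * vec[4] = 6 * 10 = 60
```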
transpose
mat
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
t(mat)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
R can recognize different types of data.
We’ll look at four basic types:
12+4
[1] 16
3000000
[1] 3e+06
class(0.000001)
[1] "numeric"
print("hello")
[1] "hello"
print('hello')
[1] "hello"
# print("hello') THIS IS WRONG: the opening and closing quotes do not match
class("hello")
[1] "character"
"12+4"
[1] "12+4"
class("12+4")
[1] "character"
#"hello" + "world"
nchar("hello")
[1] 5
paste("hello","world",sep=",");paste("hello","world",sep=" 2342eafdsghsIJGBJmdxfghvb ")
[1] "hello,world"
[1] "hello 2342eafdsghsIJGBJmdxfghvb world"
paste("hello","world",sep=",");paste("como","estas",sep="_");
[1] "hello,world"
[1] "como_estas"
paste(paste("hello","world",sep=","),paste("como","estas",sep="_"),2,sep="::"); paste(paste("hello","world",sep=","),paste("como","estas",sep="_"),"2",sep="::"); paste("hola","sin","espacios",sep="");paste0("hola","sin","espacios")
[1] "hello,world::como_estas::2"
[1] "hello,world::como_estas::2"
[1] "holasinespacios"
[1] "holasinespacios"
Which are numbers?
1; "1"; "one"
[1] 1
[1] "1"
[1] "one"
c(1, "1","one")
[1] "1" "1" "one"
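As the last output shows, mixing numbers and strings in c turns everything into character. Going back to numbers has to be done explicitly, and it only works for strings that look like numbers. A small sketch (mixed and nums are illustrative names):

```r
mixed <- c(1, "1", "one")
class(mixed)       # "character"

# as.numeric() converts what it can; "one" becomes NA (with a warning,
# silenced here with suppressWarnings()).
nums <- suppressWarnings(as.numeric(mixed))
nums               # 1 1 NA
is.na(nums)        # FALSE FALSE TRUE
```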
TRUE or FALSE: R’s form of binary data. It is useful for logical tests, and very useful when we want to filter datasets…
3<4
[1] TRUE
x <- c(1, 2, 3, 4, 5)
x
[1] 1 2 3 4 5
x > 3
[1] FALSE FALSE FALSE TRUE TRUE
x >= 3
[1] FALSE FALSE TRUE TRUE TRUE
x < 3
[1] TRUE TRUE FALSE FALSE FALSE
x <= 3
[1] TRUE TRUE TRUE FALSE FALSE
x == 3
[1] FALSE FALSE TRUE FALSE FALSE
x != 3
[1] TRUE TRUE FALSE TRUE TRUE
x = 3 # careful: a single = assigns 3 to x, it does not compare; use == for comparison
c(3,4,5,6) %in% c(2, 3, 4)
[1] TRUE TRUE FALSE FALSE
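A logical vector like the ones above can be placed inside square brackets to keep only the elements where it is TRUE, which is the idea behind filtering. A minimal base R sketch (x is redefined so the block stands alone; keep is an illustrative name):

```r
x <- c(1, 2, 3, 4, 5)

keep <- x > 3          # FALSE FALSE FALSE TRUE TRUE
x[keep]                # 4 5

# TRUE counts as 1, so sum() counts how many elements pass the test.
sum(keep)              # 2

# Conditions combine with & (and) and | (or).
x[x >= 2 & x != 4]     # 2 3 5
x[x %in% c(2, 3, 4)]   # 2 3 4
```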
titanic2<-read.csv("data/titanic2.csv",header = T)
titanic2
library(dplyr)
titanic2%>%
filter(age=="adult")
titanic2%>%
filter(class=="1st")
titanic2%>%
filter(male<mean(male))
titanic2%>%
filter(female>=mean(female))
unique(titanic2$class)
[1] 1st 2nd 3rd Crew
Levels: 1st 2nd 3rd Crew
unique(titanic2$age)
[1] adult child
Levels: adult child
titanic2%>%
filter(fate!="survived" &
as.numeric(class)>=3 &
as.numeric(age)<2)
titanic2%>%
filter(fate=="survived" &
as.numeric(class)>=3 &
as.numeric(age)<2)
titanic2%>%
filter(fate=="survived" &
class%in%c("1st","3rd"))
titanic2%>%
filter(fate!="survived" &
class%in%c("1st","3rd"))
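The filters above use as.numeric(class) >= 3, which relies on class being a factor: as.numeric then returns the position of each value among the levels (1st = 1, 2nd = 2, 3rd = 3, Crew = 4). A hedged sketch with a made-up toy data frame (not the real titanic2 data), using base R's subset in place of dplyr's filter so that no package is needed:

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  class = factor(c("1st", "2nd", "3rd", "Crew")),
  fate  = c("survived", "perished", "survived", "perished")
)

# Level codes: position of each value among the sorted levels.
as.numeric(toy$class)       # 1 2 3 4

# Rows that did not survive AND whose class code is 3 or higher.
late <- subset(toy, fate != "survived" & as.numeric(class) >= 3)
as.character(late$class)    # "Crew"

# The same kind of test written with %in% on the labels themselves.
picked <- subset(toy, class %in% c("1st", "3rd"))
nrow(picked)                # 2
```

Comparing against the labels with == or %in% tends to be easier to read than comparing level codes, since the codes depend on how the levels happen to be ordered.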
class(TRUE)
[1] "logical"
class(T) ; class(F)
[1] "logical"
[1] "logical"
class(3<4)
[1] "logical"
Factors are R’s form of categorical data. They are saved as integers with a set of labels (i.e. levels).
fac<-factor(c("a","b","c"))
fac
[1] a b c
Levels: a b c
class(fac)
[1] "factor"
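A factor stores integer codes plus the labels they point to. A small sketch (sizes and counts are illustrative names) that sets the level order explicitly instead of relying on alphabetical sorting:

```r
sizes <- factor(c("low", "high", "low", "mid"),
                levels = c("low", "mid", "high"))

levels(sizes)        # "low" "mid" "high"
as.integer(sizes)    # 1 3 1 2  (the stored codes)
nlevels(sizes)       # 3

# table() counts observations per level, in level order.
counts <- table(sizes)
as.vector(counts)    # 2 1 1
```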
One proof that factors make sense
titanic2
library(ggplot2)
gg<-ggplot(titanic2,aes(x=class,y=age))
gg + geom_point(aes(size=male))
gg + geom_point(aes(color=fate))
gg01<-gg + geom_point(aes(color=fate,size=male+female))
gg01
library(plotly)
ggplotly(gg01)
The assignment operator is <-. Alternatively, you can use =, but <- is widely preferred in the R community.
# Assign a value to the variables my_apples and my_oranges
my_apples <- 5
my_oranges <- 6
# Add these two variables together
my_apples + my_oranges
[1] 11
# Create the variable my_fruit
my_fruit <- my_apples + my_oranges
Be careful with the operations between different types/classes of objects
# Assign a value to the variable my_apples
my_apples <- 5
# Fix the assignment of my_oranges
my_oranges <- "six"
# Create the variable my_fruit and print it out
# my_fruit <- my_apples + my_oranges   # fails: a number and a character string cannot be added
class(my_oranges)
[1] "character"
So, in general, it’s a good idea to check that the objects operating on each other are of the same class/type. We also have to be aware that sometimes, if the types are not equal but are “almost operable”, R will convert at least one of them to a type that makes both “totally operable”.
R may raise warnings about this… so it is a good idea to know a little more about the data types that will show up in our work.
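To make the advice above concrete, here is a sketch of what happens when a number and a character string meet in an addition, and one way to check types first. tryCatch is used only so the block runs to the end; outcome is an illustrative name and its message is made up for this example.

```r
my_apples  <- 5
my_oranges <- "six"

# Adding a number and a string is an error; tryCatch() lets us capture it.
outcome <- tryCatch(
  my_apples + my_oranges,
  error = function(e) "cannot add these two"
)
outcome                             # "cannot add these two"

# Checking the types up front avoids the surprise.
is.numeric(my_apples)               # TRUE
is.numeric(my_oranges)              # FALSE

# With a numeric value the sum works as expected.
my_oranges <- 6
my_fruit <- my_apples + my_oranges  # 11
```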
On a Vector…
vec<-c(1,"R","TRUE")
class(vec)
[1] "character"
vec
[1] "1" "R" "TRUE"
Sure a Matrix will do it…
matriz_de_Camilo<-matrix(cbind(c(1,2,3),
c("R","S","T"),
c(TRUE,FALSE,TRUE)),ncol=3)
class(matriz_de_Camilo)
[1] "matrix"
matriz_de_Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
for(row_tmp in 1:nrow(matriz_de_Camilo)){
print(class(matriz_de_Camilo[row_tmp,]))
}
[1] "character"
[1] "character"
[1] "character"
for(col_tmp in 1:ncol(matriz_de_Camilo)){
print(class(matriz_de_Camilo[,col_tmp]))
}
[1] "character"
[1] "character"
[1] "character"
matriz_de_Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
What the … is R doing?!
Always remember Coercion
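Coercion follows a fixed order: when types are mixed in a vector or matrix, R converts everything to the most flexible type present (logical, then integer, then numeric, then character). A short sketch (m is an illustrative name):

```r
class(c(TRUE, 2L))          # "integer":   TRUE becomes 1L
class(c(2L, 3.5))           # "numeric":   2L becomes 2
class(c(3.5, "R"))          # "character": 3.5 becomes "3.5"

# The same rule explains the all-character matrix above.
m <- cbind(c(1, 2, 3), c("R", "S", "T"), c(TRUE, FALSE, TRUE))
typeof(m)                   # "character"

# Logical values coerce to 0/1, which makes arithmetic on them possible.
sum(c(TRUE, FALSE, TRUE))   # 2
```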
So, isn’t there any way we can do it?
Really? ;(
There is a way… Thank God for the data frames…
And for the lists…
df<-data.frame(c(1,2,3),
c("R","S","T"),
c(TRUE,FALSE,TRUE))
df
df_de_Camilo<-as.data.frame(matriz_de_Camilo)
df_de_Camilo
as.data.frame(matriz_de_Camilo,stringsAsFactors = F)
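Unlike a matrix, a data frame keeps one type per column. A sketch with hypothetical column names (num, chr, lgl) that checks the class of each column, and shows that a data frame built from an already coerced character matrix stays all character:

```r
df <- data.frame(
  num = c(1, 2, 3),
  chr = c("R", "S", "T"),
  lgl = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep strings as character in every R version
)
sapply(df, class)            # num: "numeric", chr: "character", lgl: "logical"

# Converting the all-character matrix does not recover the original types.
m <- cbind(c(1, 2, 3), c("R", "S", "T"), c(TRUE, FALSE, TRUE))
from_matrix <- as.data.frame(m, stringsAsFactors = FALSE)
sapply(from_matrix, class)   # every column is "character"
```

So when the columns have different types, building the data frame directly from the separate vectors is the route that preserves them.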
When we read a .csv file and store it in an object, that object will be of class data.frame
titanic2 ## remember how we got this object: read.csv("data/titanic2.csv",header = T)
class(titanic2)
[1] "data.frame"
And now, just because it’s worth it…
There are some types of objects that are very similar to data frames but are not exactly the same
They come from the tibble package, used throughout dplyr (one of my favorites), and are called tibbles (class tbl_df; older code creates them with data_frame) instead of plain data frames
Example:
library(nycflights13)
flights
class(flights)
[1] "tbl_df" "tbl" "data.frame"
Print on console:
titanic2 and, afterwards, flights
Do you see any difference?
nlst<-list(one=1,two=2,many=c(1,2,3))
nlst
$one
[1] 1
$two
[1] 2
$many
[1] 1 2 3
nlst<-list("Eduardo"=df_de_Camilo,"Camilo"=matriz_de_Camilo,"Carlos"=c(T,FALSE,TRUE,F))
#Print directly on console
nlst
$Eduardo
$Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
$Carlos
[1] TRUE FALSE TRUE FALSE
nlst$Eduardo
#Print directly on console
nlst[1]
$Eduardo
#Print directly on console
nlst[[1]]
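The difference between nlst[1] and nlst[[1]] is easy to miss in printed output: single brackets return a shorter list, double brackets return the element itself. A sketch using the simpler list from earlier, redefined so the block stands alone:

```r
nlst <- list(one = 1, two = 2, many = c(1, 2, 3))

class(nlst[1])        # "list":    still a list, with one element
class(nlst[[1]])      # "numeric": the element itself

# $name is shorthand for [["name"]].
identical(nlst$many, nlst[["many"]])   # TRUE

# Only the extracted element can be used in arithmetic.
nlst[["many"]] * 2    # 2 4 6
```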
nlst02<-list("Eduardo"=df_de_Camilo,"Camilo"=matriz_de_Camilo,"Carlos"=c(T,FALSE,TRUE,F),"unalistadentrodeunalista"=nlst)
#Print directly on console
nlst02
$Eduardo
$Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
$Carlos
[1] TRUE FALSE TRUE FALSE
$unalistadentrodeunalista
$unalistadentrodeunalista$Eduardo
$unalistadentrodeunalista$Camilo
[,1] [,2] [,3]
[1,] "1" "R" "TRUE"
[2,] "2" "S" "FALSE"
[3,] "3" "T" "TRUE"
$unalistadentrodeunalista$Carlos
[1] TRUE FALSE TRUE FALSE
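Elements of a nested list are reached by chaining the same extractors, one level at a time. A minimal sketch with a smaller, made-up list (inner and outer_lst are illustrative names, not objects from the examples above):

```r
inner <- list(flags = c(TRUE, FALSE, TRUE, FALSE))
outer_lst <- list(numbers = c(1, 2, 3), nested = inner)

# Chain $ or [[ ]] to walk down the levels.
outer_lst$nested$flags                # TRUE FALSE TRUE FALSE
outer_lst[["nested"]][["flags"]][2]   # FALSE

# str() prints a compact overview of the whole structure.
str(outer_lst)
```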